This can help us to easily identify rollup rows at different levels and then to create friendly labels at each level:
SELECTIF(GROUPING(vendor_state) =1, 'Vendor_State_Count', vendor_state) AS vendor_state,IF(GROUPING(vendor_city) =1, 'Vendor_City_Count', vendor_city) AS vendor_city,COUNT(*) AS qty_vendorsFROM ap.vendorsWHERE vendor_state IN ('IA', 'NJ')GROUPBY vendor_state, vendor_city WITHROLLUP
8 records
vendor_state
vendor_city
qty_vendors
IA
Fairfield
1
IA
Washington
1
IA
Vendor_City_Count
2
NJ
East Brunswick
2
NJ
Fairfield
1
NJ
Washington
1
NJ
Vendor_City_Count
4
Vendor_State_Count
Vendor_City_Count
6
If you want to see rollup rows only:
SELECTIF(GROUPING(vendor_state) =1, 'Vendor_State_Count', vendor_state) AS vendor_state,IF(GROUPING(vendor_city) =1, 'Vendor_City_Count', vendor_city) AS vendor_city,COUNT(*) AS qty_vendorsFROM ap.vendorsWHERE vendor_state IN ('IA', 'NJ')GROUPBY vendor_state, vendor_city WITHROLLUPHAVINGGROUPING(vendor_state) =1ORGROUPING(vendor_city) =1
3 records
vendor_state
vendor_city
qty_vendors
IA
Vendor_City_Count
2
NJ
Vendor_City_Count
4
Vendor_State_Count
Vendor_City_Count
6
3.4 Window: OVER(), PARTITION BY
What if you would like to create grouping effect but you don’t want to lose any granularity of data? This is where “window” concept cuts in. OVER() function creates a window, with PARTITION BY specifies the “grouping effect” for the windows to be created.
The following example creates a sum of invoice_total at different levels. Notice that:
total_invoices: created with nothing in OVER(), it will just create a column of sum value
vendor_total: created with PARTITION BY in OVER(), it will create subtotals by each value of vendor_id
you don’t lose any granularity of the data, you can still choose all the columns from invoices table that you wish to include.
you can still use WHERE to subset for target table.
SELECT vendor_id, invoice_date, invoice_total,SUM(invoice_total) OVER() AS total_invoices,SUM(invoice_total) OVER(PARTITIONBY vendor_id) AS vendor_totalFROM ap.invoicesWHERE invoice_total >5000
8 records
vendor_id
invoice_date
invoice_total
total_invoices
vendor_total
72
2022-06-01
21842.00
155800
21842.00
99
2022-06-18
6940.25
155800
6940.25
104
2022-05-21
7125.34
155800
7125.34
110
2022-05-28
37966.19
155800
119892.41
110
2022-07-19
26881.40
155800
119892.41
110
2022-07-23
20551.18
155800
119892.41
110
2022-07-24
23517.58
155800
119892.41
110
2022-07-31
10976.06
155800
119892.41
Running Windows: OVER(), PARTITION BY, ORDER BY
Comparing with the above, with ORDER BY added, we are now calculating running total for each vendor as each invoice being issued for the same vendor.
“ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW”: determines the running total behaviour. It says operations performed for all rows up to this current row. See window function for vendor_id 110 in action.
SELECT vendor_id, invoice_date, invoice_total,SUM(invoice_total) OVER() AS total_invoices,SUM(invoice_total) OVER(PARTITIONBY vendor_idORDERBY invoice_dateROWSBETWEENUNBOUNDEDPRECEDINGANDCURRENTROW) AS vendor_totalFROM ap.invoicesWHERE invoice_total >5000
8 records
vendor_id
invoice_date
invoice_total
total_invoices
vendor_total
72
2022-06-01
21842.00
155800
21842.00
99
2022-06-18
6940.25
155800
6940.25
104
2022-05-21
7125.34
155800
7125.34
110
2022-05-28
37966.19
155800
37966.19
110
2022-07-19
26881.40
155800
64847.59
110
2022-07-23
20551.18
155800
85398.77
110
2022-07-24
23517.58
155800
108916.35
110
2022-07-31
10976.06
155800
119892.41
You can modify the “width” of the window. See the following:
ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING: allows you to use 1 row up (if any) and 1 row down (if any) relative to the current row to make calculation. Obviously you can change to any numbers as you wish.
Now it calculates the average invoice total amounts based on rolling 3 invoice_dates.
SELECT vendor_id, invoice_date, invoice_total,SUM(invoice_total) OVER() AS total_invoices,ROUND(AVG(invoice_total) OVER(PARTITIONBY vendor_id ORDERBY invoice_date ROWSBETWEEN1PRECEDINGAND1FOLLOWING), 2) AS vendor_totalFROM ap.invoicesWHERE invoice_total >5000
8 records
vendor_id
invoice_date
invoice_total
total_invoices
vendor_total
72
2022-06-01
21842.00
155800
21842.00
99
2022-06-18
6940.25
155800
6940.25
104
2022-05-21
7125.34
155800
7125.34
110
2022-05-28
37966.19
155800
32423.80
110
2022-07-19
26881.40
155800
28466.26
110
2022-07-23
20551.18
155800
23650.05
110
2022-07-24
23517.58
155800
18348.27
110
2022-07-31
10976.06
155800
17246.82
If you need to modify the window based on range of a value you can use RANGE instead of ROW keyword.
The following specifies the window to be based on invoice_date and treat a day before and a day after as the same range. (See vendor_id 110 and between 2022-07-19 and 2022-07-31):
SELECT vendor_id, invoice_date, invoice_total,SUM(invoice_total) OVER() AS total_invoices,SUM(invoice_total) OVER(PARTITIONBY vendor_id ORDERBY invoice_date RANGEBETWEENINTERVAL1DAYPRECEDINGANDINTERVAL1DAYFOLLOWING) AS vendor_totalFROM ap.invoicesWHERE invoice_total >5000
8 records
vendor_id
invoice_date
invoice_total
total_invoices
vendor_total
72
2022-06-01
21842.00
155800
21842.00
99
2022-06-18
6940.25
155800
6940.25
104
2022-05-21
7125.34
155800
7125.34
110
2022-05-28
37966.19
155800
37966.19
110
2022-07-19
26881.40
155800
26881.40
110
2022-07-23
20551.18
155800
44068.76
110
2022-07-24
23517.58
155800
44068.76
110
2022-07-31
10976.06
155800
10976.06
Create Row Index
The following creates row_number for each group by vendor_state:
The following compares the differences among rankings and row index:
SELECT vendor_id, invoice_total, invoice_number,ROW_NUMBER() OVER (PARTITIONBY vendor_id ORDERBY invoice_total) AS'row_id', RANK() OVER (PARTITIONBY vendor_id ORDERBY invoice_total) AS'rank', DENSE_RANK() OVER (PARTITIONBY vendor_id ORDERBY invoice_total) AS'dense_rank'FROM ap.invoices
Displaying records 1 - 15
vendor_id
invoice_total
invoice_number
row_id
rank
dense_rank
34
116.54
QP58872
1
1
1
34
1083.58
Q545443
2
2
2
37
116.00
547479217
1
1
1
37
224.00
547481328
2
2
2
37
224.00
547480102
3
2
2
48
856.92
P02-88D77S7
1
1
1
72
85.31
39104
1
1
1
72
21842.00
40318
2
2
2
80
90.36
134116
1
1
1
80
175.00
133560
2
2
2
81
936.93
MABO1489
1
1
1
82
600.00
C73-24
1
1
1
83
579.42
31361833
1
1
1
83
1575.00
31359783
2
2
2
86
2433.00
367447
1
1
1
Create Lags and Leads
LAG(): Allows to “push each record down” and lines up the current row value with the value from last row. LEAD(): Allows to “pull each record up” and lines up the current row value with the value from next row.
These are useful to compare results for year over year, month over month etc.
Also notice here that “WINDOW” clause allows to create alias of a particular type of window, so you don’t have to repeat the specifications for OVER() each time.
SELECT rep_id, sales_year,-- lag series sales_total AS current_sales,LAG(sales_total, 1, 0) OVER rep_window AS last_sales, Sales_total -LAG(sales_total, 1, 0) OVER rep_window AS change_from_last,-- lead series sales_total AS current_sales_again,LEAD(sales_total, 1, 0) OVER rep_window AS next_sales,LEAD(sales_total, 1, 0) OVER rep_window - Sales_total AS change_from_currentFROM ex.sales_totalsWINDOW rep_window AS (PARTITIONBY rep_id ORDERBY sales_year)